On the Combining Significances 



Sergey Bityukov, Nikolai Krasnikov, Alexander Nikitenko 
(e-mail: Serguei.Bitioukov@cern.ch) 

Abstract 

We present a statistical approach to combining signal significances. 

1 What do we mean by significance? 

The measure of the excess of observed (or expected) events in an experiment above the 
background is often called the signal significance. According to ref. [1], "Common practice 
is to express the significance of an enhancement by quoting the number of standard 
deviations". 

Let us distinguish two classes of significance: 

• "the initial (or internal) significance" $S$ of an experiment is a function of two 
parameters of the experiment, the expected number of signal events $N_s$ and the expected 
number of background events $N_b$ in the given experiment ("the initial significance" 
can be considered as a potential for discovery in planned experiments [2]), 

• "the observed significance" $\hat{S}$ is a function of the observed number of events 
$\hat{N}_{obs}$ and of the expected background $N_b$ [3]. 

The first one is a parameter of the experiment. We suppose that it is constant for a given 
integral luminosity. The second one is a realization of a random variable. The observed 
significance is considered as an estimator of the initial significance. 

Why can we consider the observed significance as the realization of a random variable? 

The observed number of events $\hat{N}_{obs}$ is a realization of a random variable that obeys 
the Poisson distribution; hence the observed significance $\hat{S}$, being a function of 
$\hat{N}_{obs}$, is also a realization of a random variable. 
This is easy to show. Let us take, as an example, the "counting" [4] significance $S_{c12}$ [2] 
and the significance $S_{cP}$ [5]. 

The observed significance $\hat{S}_{c12}$ is given by the formula 

\[ \hat{S}_{c12} = 2 \left( \sqrt{\hat{N}_{obs}} - \sqrt{N_b} \right). \qquad (1) \]
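
As a quick numerical illustration of Eq. (1), the sketch below evaluates the counting significance for one pair of counts. The function name `s_c12` and the counts (100 observed events over an expected background of 64) are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math

def s_c12(n_obs, n_b):
    """Observed counting significance of Eq. (1): 2 * (sqrt(n_obs) - sqrt(n_b))."""
    return 2.0 * (math.sqrt(n_obs) - math.sqrt(n_b))

# Hypothetical counts: 100 observed events, 64 expected background events.
example = s_c12(100, 64)
print(example)  # 2 * (10 - 8) = 4.0
```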

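The significance $S_{cP}$ is the probability, under a Poisson distribution with mean $N_b$, of 
observing $\hat{N}_{obs}$ or more events, converted to the equivalent number of standard 
deviations of a Gaussian distribution, i.e. 

\[ \beta = \frac{1}{\sqrt{2\pi}} \int_{\hat{S}_{cP}}^{\infty} e^{-x^2/2} \, dx, \qquad \beta = \sum_{i=\hat{N}_{obs}}^{\infty} \frac{N_b^{\,i} e^{-N_b}}{i!}. \qquad (2) \]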

We use a method that connects the magnitude of "the observed significance" 
with the confidence density [6], [7] of the parameter "the initial significance". This method 
has been applied in many studies [8], [9]. We carried out a uniform scan of the initial 
significances $S_{c12}$ and $S_{cP}$, varying $S_{c12}$ from $S_{c12} = 1$ to $S_{c12} = 16$ with step size 
0.075 and varying $S_{cP}$ from $S_{cP} = 0$ to $S_{cP} = 6.2$ with step size 0.031. For each value 
of $S_{c12}$ and $S_{cP}$ we generated 30000 trials from two Poisson distributions (with parameters 
$N_s$ and $N_b$), using the RNPSSN function (CERNLIB [10]), to construct the conditional 
distribution of the probability (the confidence density) that the initial significance 
$S_{c12}$ or $S_{cP}$ produces the observed value $\hat{S}_{c12}$ or $\hat{S}_{cP}$, correspondingly. 
We assume that the integral luminosity of the experiment is constant, i.e. that $N_s + N_b$ is 
fixed. The parameters $N_s$ and $N_b$ are chosen in accordance with the given initial 
significance $S_{c12}$ or $S_{cP}$; the realization $\hat{N}_{obs}$ is the sum of the realizations 
$\hat{N}_s$ and $\hat{N}_b$ of two random variables with parameters $N_s$ and $N_b$, correspondingly. 
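
The trial generation described above can be sketched in a few lines of Python for a single value of the initial significance. This is only an illustration under stated assumptions: it assumes the initial counting significance has the form of Eq. (1) with the observed count replaced by N_s + N_b, it uses a simple hand-written Poisson sampler in place of the CERNLIB routine RNPSSN, and the helper names (`sample_poisson`, `split_counts`, `observed_significances`) are hypothetical.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw one Poisson variate (Knuth's multiplication method, adequate for means below ~100)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def s_c12(n_obs, n_b):
    """Observed counting significance of Eq. (1)."""
    return 2.0 * (math.sqrt(n_obs) - math.sqrt(n_b))

def split_counts(s_initial, n_total):
    """Pick (N_s, N_b) with N_s + N_b = n_total that reproduce s_initial.

    Assumption: initial significance = 2 * (sqrt(N_s + N_b) - sqrt(N_b)).
    """
    n_b = (math.sqrt(n_total) - s_initial / 2.0) ** 2
    return n_total - n_b, n_b

def observed_significances(s_initial, n_total=70.0, trials=30000, seed=1):
    """Simulate observed significances: N_obs is the sum of two Poisson draws."""
    rng = random.Random(seed)
    n_s, n_b = split_counts(s_initial, n_total)
    return [
        s_c12(sample_poisson(n_s, rng) + sample_poisson(n_b, rng), n_b)
        for _ in range(trials)
    ]

values = observed_significances(4.0)
mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
print(f"initial S = 4.0: mean = {mean:.3f}, variance = {variance:.3f}")
```

Repeating the call over a grid of initial significances would give a scan of the kind described in the text; only one grid point is shown here to keep the sketch short.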

In Fig. 1 the distributions of $\hat{S}_{c12}$ for several values of the initial significance 
$S_{c12}$ with the given integral luminosity $N_s + N_b = 70$ are shown. As can be seen, the 
distributions of the observed significance are similar to the distributions of realizations 
of a normally distributed random variable with variance close to 1. The distribution of the 
observed significance $\hat{S}_{c12}$ versus the initial significance $S_{c12}$ (Fig. 2) shows 
the result of the full scan. 

The normal distributions with a fixed variance are statistically self-dual distributions [7]. 
This means that the confidence density of the parameter "initial significance" $S$ has the 
same distribution as the random variable that produced the realization "the observed 
significance" $\hat{S}$. Several distributions of the probability of the initial significances 
$S_{c12}$ to produce the observed values of $\hat{S}_{c12}$ are presented in Fig. 3. These figures 
clearly show that the observed significance $\hat{S}_{c12}$ is an estimator of the initial 
significance $S_{c12}$. The distribution presented in Fig. 4 shows the result of the full scan 
in the case of the observed significance $\hat{S}_{cP}$ and the initial significance $S_{cP}$. 

The error of these estimators obeys, to good accuracy, the standard normal distribution 
(variance equal to 1). This can be confirmed by applying Eqs. 1-2 to pure background. 
The results of the simulation without signal (3000000 trials) are shown in Fig. 5 (for 
the estimator $\hat{S}_{c12}$) and in Fig. 6 (for the estimator $\hat{S}_{cP}$). 
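
A reduced version of this background-only check can be sketched as follows. The expected background of 50 events, the 20000 trials (far fewer than the 3000000 quoted above), and all helper names are assumptions made for illustration. The significance S_cP is computed from its verbal definition: the Poisson probability of observing N_obs or more events, converted to Gaussian standard deviations.

```python
import math
import random
from functools import lru_cache
from statistics import NormalDist

def sample_poisson(mean, rng):
    """Draw one Poisson variate (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def s_c12(n_obs, n_b):
    """Observed counting significance of Eq. (1)."""
    return 2.0 * (math.sqrt(n_obs) - math.sqrt(n_b))

def poisson_tail(n_obs, mean):
    """P(N >= n_obs) for N ~ Poisson(mean), summed term by term from n_obs upward."""
    if n_obs <= 0:
        return 1.0
    term = math.exp(-mean + n_obs * math.log(mean) - math.lgamma(n_obs + 1))
    total, i = 0.0, n_obs
    while True:
        total += term
        i += 1
        term *= mean / i
        if i > mean and term <= 1e-17 * total:
            break
    return min(total, 1.0)

@lru_cache(maxsize=None)
def s_cp(n_obs, n_b):
    """Poisson tail probability expressed as a one-sided Gaussian significance."""
    beta = poisson_tail(n_obs, n_b)
    if beta >= 1.0:
        return float("-inf")  # tail probability of 1: no finite significance
    if beta <= 0.0:
        return float("inf")   # tail probability underflowed to 0
    return -NormalDist().inv_cdf(beta)

def mean_and_variance(values):
    mean = sum(values) / len(values)
    return mean, sum((v - mean) ** 2 for v in values) / (len(values) - 1)

N_B = 50.0      # hypothetical expected background
TRIALS = 20000  # reduced trial count for a quick check
rng = random.Random(2)
counts = [sample_poisson(N_B, rng) for _ in range(TRIALS)]  # signal absent
stats_c12 = mean_and_variance([s_c12(n, N_B) for n in counts])
stats_cp = mean_and_variance([s_cp(n, N_B) for n in counts])
print("S_c12: mean = %.3f, variance = %.3f" % stats_c12)
print("S_cP:  mean = %.3f, variance = %.3f" % stats_cp)
```

Because the Poisson distribution is discrete, the mean of the simulated S_cP values is expected to sit slightly below zero; the variance is the quantity relevant to the statement that follows.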

Statement 1: The observed significance (the case of the Poisson flows of events) 
is a realization of a random variable which can be approximated by 
a normal distribution with variance close to 1. 

2 What is the Combining Significance? 

Statement 1 allows us to treat combinations of several partial significances $S_i$ 
in a simple way, as combinations of independent, normally distributed random variables. 

Figure 1: The observed significances $\hat{S}_{c12}$ for the case $N_s + N_b = 70$. 

Figure 2: The distribution of the observed significance $\hat{S}_{c12}$ versus the initial 
significance $S_{c12}$ (plot title: $N_s + N_b = 70$, significance scan from $S = 1$ up to $S = 16$). 

Figure 3: The distributions of the initial significances $S_{c12}$ (confidence densities) for the 
case $N_s + N_b = 70$. 


Let us define the observed sum $\hat{S}_{sum}$ of partial significances and the observed combining 
significance $\hat{S}_{comb}$ for the $n$ observed partial significances $\hat{S}_i$ with variances 
$\mathrm{var}(\hat{S}_i)$: 

\[ \hat{S}_{sum} = \sum_{i=1}^{n} \hat{S}_i, \qquad \mathrm{var}(\hat{S}_{sum}) = \sum_{i=1}^{n} \mathrm{var}(\hat{S}_i), \qquad (3) \]

\[ \hat{S}_{comb} = \frac{\hat{S}_{sum}}{\sqrt{\mathrm{var}(\hat{S}_{sum})}}. \qquad (4) \]


Statement 2: The ratio of the sum of several observed partial significances 
to the standard deviation of this sum is the observed combining significance 
of these partial significances. 

In our case of Poisson flows of events the variances of the considered significances are 
close to 1. This means that the formula (Eq. 4) is approximated by the formula 

\[ \hat{S}_{comb} \approx \frac{\hat{S}_{sum}}{\sqrt{n}}. \qquad (5) \]
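
A small worked example may make the combination rule concrete. The sketch below divides the sum of the partial significances by the standard deviation of that sum; with unit variances this reduces to division by the square root of n. The function name `combine` and the four input values are hypothetical. The second half illustrates our reading of the paper's note on additivity: a combined value built from k inputs has to be rescaled by the square root of k before it enters a further combination.

```python
import math

def combine(significances, variances=None):
    """Sum of the inputs divided by the standard deviation of the sum.

    If variances is omitted, every input is assumed to have unit variance,
    so the divisor is sqrt(n).
    """
    if variances is None:
        variances = [1.0] * len(significances)
    return sum(significances) / math.sqrt(sum(variances))

partial = [2.0, 3.0, 1.0, 2.0]   # hypothetical observed partial significances
direct = combine(partial)
print(direct)                    # 8 / sqrt(4) = 4.0

# Combining in stages: first two inputs, then the remaining two.
first_two = combine(partial[:2])                       # 5 / sqrt(2)
naive = combine([first_two, partial[2], partial[3]])   # treats first_two as a single input
staged = (math.sqrt(2) * first_two + partial[2] + partial[3]) / math.sqrt(4)
print(naive, staged)             # only the rescaled version matches the direct result
```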



This can also be shown by Monte Carlo. Let us generate observations of the significance 
$\hat{S}_{c12}$ for four experiments with different parameters $N_b$ and $N_s$ simultaneously. 
The results of this simulation (30000 trials) for each experiment are presented in Fig. 7. 
The distribution of the sums of the four observed significances in each trial is shown in 
Fig. 8 (top). Correspondingly, Fig. 8 (bottom) presents the distribution of these sums 
divided by $\sqrt{4}$ in each trial, i.e. the distribution of the observed combined significances. 
This property also holds for the significance $S_{cP}$. 
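
The four-experiment simulation can be imitated with the sketch below. The text does not list the parameters used for the figures, so the four (N_s, N_b) pairs, the reduced number of trials, and the helper names are all assumptions made for illustration.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw one Poisson variate (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def s_c12(n_obs, n_b):
    """Observed counting significance of Eq. (1)."""
    return 2.0 * (math.sqrt(n_obs) - math.sqrt(n_b))

def mean_of(values):
    return sum(values) / len(values)

def variance_of(values):
    m = mean_of(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

# Hypothetical (N_s, N_b) pairs for the four experiments.
EXPERIMENTS = [(20.0, 50.0), (15.0, 40.0), (10.0, 30.0), (25.0, 60.0)]
TRIALS = 10000  # fewer than the 30000 trials quoted in the text
rng = random.Random(3)

sums = []
for _ in range(TRIALS):
    total = 0.0
    for n_s, n_b in EXPERIMENTS:
        n_obs = sample_poisson(n_s, rng) + sample_poisson(n_b, rng)
        total += s_c12(n_obs, n_b)
    sums.append(total)

# Observed combined significance in each trial: the sum divided by sqrt(4).
combined = [s / math.sqrt(len(EXPERIMENTS)) for s in sums]

print(f"variance of sums     = {variance_of(sums):.3f}")
print(f"variance of combined = {variance_of(combined):.3f}")
```

If the partial significances behave as independent variables with variance near 1, the variance of the sums should come out near 4 and the variance of the normalized sums near 1.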



3 Conclusion 

The initial significance is a parameter of the given measurement. The observed significance 
is a realization of a random variable. The observed significance is also an estimator 
of the initial significance. This means that we must consider combinations of 
significances as combinations of random variables with corresponding estimators. 

1 Note that the additivity of observed combined significances is not conserved. To apply Eq. 3 we must 
take into account the number of partial significances in each observed combined significance. 

Figure 7: The distributions of the observed significances $\hat{S}_{c12}$ for four different 
experiments. 

Figure 8: The distribution of the sum of observed significances in different experiments 
for each trial (top). The distribution of the normalized sums of observed significances 
(bottom). 